Goto

Collaborating Authors

 dynamic inference


Dynamic Inference with Neural Interpreters

Neural Information Processing Systems

Modern neural network architectures can leverage large amounts of data to generalize well within the training distribution. However, they are less capable of systematic generalization to data drawn from unseen but related distributions, a feat that is hypothesized to require compositional reasoning and reuse of knowledge. In this work, we present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules, which we call . Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned. The proposed architecture can flexibly compose computation along width and depth, and lends itself well to capacity extension after training. To demonstrate the versatility of Neural Interpreters, we evaluate it in two distinct settings: image classification and visual abstract reasoning on Raven Progressive Matrices. In the former, we show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferrable to a new task in a sample efficient manner. In the latter, we find that Neural Interpreters are competitive with respect to the state-of-the-art in terms of systematic generalization.


1 Overall Response 1 We sincerely thank all the reviewers for their helpful comments and constructive suggestions

Neural Information Processing Systems

We sincerely thank all the reviewers for their helpful comments and constructive suggestions. Actually, the motivation of using attention and distillation differs from their origins. Regarding the comparison with related work. Net[C2] and NestedNet[C3] (results taken from their papers). The proposed method is robust to hyper-parameters.


Neural Substitute Solver for Efficient Edge Inference of Power Electronic Hybrid Dynamics

Zheng, Jialin, Wang, Haoyu, Zeng, Yangbin, Xu, Han, Mou, Di, Li, Hong, Vazquez, Sergio, Franquelo, Leopoldo G.

arXiv.org Artificial Intelligence

--Advancing the dynamics inference of power electronic systems (PES) to the real-time edge-side holds transformative potential for testing, control, and monitoring. However, efficiently inferring the inherent hybrid continuous-discrete dynamics on resource-constrained edge hardware remains a significant challenge. This letter proposes a neural substitute solver (NSS) approach, which is a neural-network-based framework ai med at rapid accurate inference with significantly reduced computational costs. Specifically, NSS leverages lightweight neural networks to substitute time-consuming matrix operation and high-order numerical integration steps in traditional solvers, which transforms sequential bottlenecks in to highly parallel operation suitable for edge hardware. Experimental validation on a multi-stage DC-DC converter demonstrates that NSS achieves 23x speedup and 60% hardware resource reduction compared to traditional so lvers, paving the way for deploying edge inference of high-fidelity PES dynamics.


DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution

Neural Information Processing Systems

Multimodal Large Language Models (MLLMs) have demonstrated remarkable comprehension and reasoning capabilities with complex language and visual data.These advances have spurred the vision of establishing a generalist robotic MLLM proficient in understanding complex human instructions and accomplishing various embodied tasks, whose feasibility has been recently verified \cite{rt-2,rt-x}.However, developing MLLMs for real-world robots is challenging due to the typically limited computation and memory capacities available on robotic platforms. In contrast, the inference of MLLMs usually incorporates storing billions of parameters and performing tremendous computation, imposing significant hardware demands.In our paper, we seek to address this challenge by leveraging an intriguing observation: relatively easier situations make up the bulk of the procedure of controlling robots to fulfill diverse tasks, and they generally require far smaller models to obtain the correct robotic actions.Motivated by this observation, we propose a \emph{DynamicEarly-Exit for Robotic MLLM} (DeeR) framework that automatically adjusts the size of the activated MLLM based on each situation at hand. The approach leverages a multi-exit architecture in MLLMs, which allows the model to cease processing once a proper size of the model has been activated for a specific situation, thus avoiding further redundant computation. Additionally, we develop novel algorithms that establish early-termination criteria for DeeR, conditioned on predefined demands such as average computational cost (\emph{i.e.}, power consumption), as well as peak computational consumption (\emph{i.e.}, latency) and GPU memory usage. These enhancements ensure that DeeR operates efficiently under varying resource constraints while maintaining competitive performance.Moreover, we design a tailored training method for integrating temporal information on top of such multi-exit architectures to predict actions reasonably.


Dynamic Inference with Neural Interpreters

Neural Information Processing Systems

Modern neural network architectures can leverage large amounts of data to generalize well within the training distribution. However, they are less capable of systematic generalization to data drawn from unseen but related distributions, a feat that is hypothesized to require compositional reasoning and reuse of knowledge. In this work, we present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules, which we call functions. Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned. The proposed architecture can flexibly compose computation along width and depth, and lends itself well to capacity extension after training.


Get More at Once: Alternating Sparse Training with Gradient Correction

Neural Information Processing Systems

Recently, a new trend of exploring training sparsity has emerged, which remove parameters during training, leading to both training and inference efficiency improvement. This line of works primarily aims to obtain a single sparse model under a pre-defined large sparsity ratio. It leads to a static/fixed sparse inference model that is not capable of adjusting or re-configuring its computation complexity (i.e., inference structure, latency) after training for real-world varying and dynamic hardware resource availability. To enable such run-time or post-training network morphing, the concept of dynamic inference' ortraining-once-for-all' has been proposed to train a single network consisting of multiple sub-nets once, but each sub-net could perform the same inference function with different computing complexity. However, the traditional dynamic inference training method requires a joint training scheme with multi-objective optimization, which suffers from very large training overhead. In this work, for the first time, we propose a novel alternating sparse training (AST) scheme to train multiple sparse sub-nets for dynamic inference without extra training cost compared to the case of training a single sparse model from scratch.


Neural Search Space in Gboard Decoder

Zhang, Yanxiang, Zhang, Yuanbo, Sun, Haicheng, Wang, Yun, Dou, Billy, Sivek, Gary, Zhai, Shumin

arXiv.org Artificial Intelligence

Gboard Decoder produces suggestions by looking for paths that best match input touch points on the context aware search space, which is backed by the language Finite State Transducers (FST). The language FST is currently an N-gram language model (LM). However, N-gram LMs, limited in context length, are known to have sparsity problem under device model size constraint. In this paper, we propose \textbf{Neural Search Space} which substitutes the N-gram LM with a Neural Network LM (NN-LM) and dynamically constructs the search space during decoding. Specifically, we integrate the long range context awareness of NN-LM into the search space by converting its outputs given context, into the language FST at runtime. This involves language FST structure redesign, pruning strategy tuning, and data structure optimizations. Online experiments demonstrate improved quality results, reducing Words Modified Ratio by [0.26\%, 1.19\%] on various locales with acceptable latency increases. This work opens new avenues for further improving keyboard decoding quality by enhancing neural LM more directly.


Dynamic Inference with Neural Interpreters

Neural Information Processing Systems

Modern neural network architectures can leverage large amounts of data to generalize well within the training distribution. However, they are less capable of systematic generalization to data drawn from unseen but related distributions, a feat that is hypothesized to require compositional reasoning and reuse of knowledge. In this work, we present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules, which we call functions. Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned. The proposed architecture can flexibly compose computation along width and depth, and lends itself well to capacity extension after training.


Early Classification for Dynamic Inference of Neural Networks

Wang, Jingcun, Li, Bing, Zhang, Grace Li

arXiv.org Artificial Intelligence

Deep neural networks (DNNs) have been successfully applied in various fields. In DNNs, a large number of multiply-accumulate (MAC) operations is required to be performed, posing critical challenges in applying them in resource-constrained platforms, e.g., edge devices. Dynamic neural networks have been introduced to allow a structural adaption, e.g., early-exit, according to different inputs to reduce the computational cost of DNNs. Existing early-exit techniques deploy classifiers at intermediate layers of DNNs to push them to make a classification decision as early as possible. However, the learned features at early layers might not be sufficient to exclude all the irrelevant classes and decide the correct class, leading to suboptimal results. To address this challenge, in this paper, we propose a class-based early-exit for dynamic inference. Instead of pushing DNNs to make a dynamic decision at intermediate layers, we take advantages of the learned features in these layers to exclude as many irrelevant classes as possible, so that later layers only have to determine the target class among the remaining classes. Until at a layer only one class remains, this class is the corresponding classification result. To realize this class-based exclusion, we assign each class with a classifier at intermediate layers and train the networks together with these classifiers. Afterwards, an exclusion strategy is developed to exclude irrelevant classes at early layers. Experimental results demonstrate the computational cost of DNNs in inference can be reduced significantly.


DynamicDet: A Unified Dynamic Architecture for Object Detection

Lin, Zhihao, Wang, Yongtao, Zhang, Jinhe, Chu, Xiaojie

arXiv.org Artificial Intelligence

Dynamic neural network is an emerging research topic in deep learning. With adaptive inference, dynamic models can achieve remarkable accuracy and computational efficiency. However, it is challenging to design a powerful dynamic detector, because of no suitable dynamic architecture and exiting criterion for object detection. To tackle these difficulties, we propose a dynamic framework for object detection, named DynamicDet. Firstly, we carefully design a dynamic architecture based on the nature of the object detection task. Then, we propose an adaptive router to analyze the multi-scale information and to decide the inference route automatically. We also present a novel optimization strategy with an exiting criterion based on the detection losses for our dynamic detectors. Last, we present a variable-speed inference strategy, which helps to realize a wide range of accuracy-speed trade-offs with only one dynamic detector. Extensive experiments conducted on the COCO benchmark demonstrate that the proposed DynamicDet achieves new state-of-the-art accuracy-speed trade-offs. For instance, with comparable accuracy, the inference speed of our dynamic detector Dy-YOLOv7-W6 surpasses YOLOv7-E6 by 12%, YOLOv7-D6 by 17%, and YOLOv7-E6E by 39%. The code is available at https://github.com/VDIGPKU/DynamicDet.